Counterfactual equivalence for POMDPs, and underlying deterministic environments

Author

  • Stuart Armstrong
Abstract

Partially Observable Markov Decision Processes (POMDPs) are rich environments often used in machine learning. But the issue of information and causal structures in POMDPs has been relatively little studied. This paper presents the concepts of equivalent and counterfactually equivalent POMDPs, where agents cannot distinguish which environment they are in through any observations and actions. It shows that any POMDP is counterfactually equivalent, for any finite number of turns, to a deterministic POMDP with all uncertainty concentrated into the initial state. This allows a better understanding of POMDP uncertainty, information, and learning.
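
The idea of concentrating all uncertainty into the initial state can be illustrated with a small sketch. This is not the paper's construction; it is a toy example under stated assumptions: a hypothetical two-state, two-action POMDP with made-up probabilities, in which every random number needed for a fixed horizon is drawn once and stored in an augmented initial state, so that each later transition and observation is a deterministic function of that state and the chosen action. All names (`TRANSITION`, `OBSERVATION`, `sample_initial_state`, `step`, `rollout`) are illustrative.

```python
import random

# Hypothetical toy POMDP (numbers are made up, not from the paper).
# TRANSITION[state][action] = probability that the next hidden state is 1.
TRANSITION = {0: {0: 0.1, 1: 0.7}, 1: {0: 0.4, 1: 0.9}}
# OBSERVATION[state] = probability of emitting observation 1 in that state.
OBSERVATION = {0: 0.2, 1: 0.8}


def sample_initial_state(horizon, rng, start=0):
    """Draw all randomness up front: two uniform numbers per turn."""
    noise = tuple((rng.random(), rng.random()) for _ in range(horizon))
    # Augmented state: (hidden state, turn index, pre-sampled noise).
    return (start, 0, noise)


def step(full_state, action):
    """Deterministic update: no new randomness is drawn here."""
    state, turn, noise = full_state
    u_trans, u_obs = noise[turn]
    next_state = 1 if u_trans < TRANSITION[state][action] else 0
    observation = 1 if u_obs < OBSERVATION[next_state] else 0
    return (next_state, turn + 1, noise), observation


def rollout(full_state, actions):
    """Observation sequence produced by a fixed action sequence."""
    observations = []
    for action in actions:
        full_state, obs = step(full_state, action)
        observations.append(obs)
    return observations


if __name__ == "__main__":
    s0 = sample_initial_state(horizon=3, rng=random.Random(0))
    # Same initial state, different action sequences: each outcome is fixed
    # once s0 is known, so "what if I had acted differently" is well defined.
    print(rollout(s0, [0, 0, 0]))
    print(rollout(s0, [1, 1, 1]))
```

Because the noise tuple has a fixed length, the sketch only covers a finite number of turns, in line with the finite-horizon qualification in the abstract. The particular way the noise is shared across different actions is a choice of this sketch, not something the abstract specifies.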

Similar resources

Proposal for an Algorithm to Improve a Rational Policy in POMDPs

Reinforcement learning is a kind of machine learning. The Partially Observable Markov Decision Process (POMDP) is a representative class of non-Markovian environments in reinforcement learning. The Rational Policy Making algorithm (RPM) is known to learn a deterministic rational policy in POMDPs. Though RPM can learn a policy very quickly, it needs numerous trials to improve the policy. Furthermore ...


Feature Reinforcement Learning using Looping Suffix Trees

There has recently been much interest in history-based methods using suffix trees to solve POMDPs. However, these suffix trees cannot efficiently represent environments that have long-term dependencies. We extend the recently introduced CTΦMDP algorithm to the space of looping suffix trees which have previously only been used in solving deterministic POMDPs. The resulting algorithm replicates r...


Deterministic POMDPs Revisited

We study a subclass of POMDPs, called Deterministic POMDPs, that is characterized by deterministic actions and observations. These models do not provide the same generality of POMDPs yet they capture a number of interesting and challenging problems, and permit more efficient algorithms. Indeed, some of the recent work in planning is built around such assumptions mainly by the quest of amenable ...


BCK-ALGEBRAS AND HYPER BCK-ALGEBRAS INDUCED BY A DETERMINISTIC FINITE AUTOMATON

In this note first we define a BCK‐algebra on the states of a deterministic finite automaton. Then we show that it is a BCK‐algebra with condition (S) and also it is a positive implicative BCK‐algebra. Then we find some quotient BCK‐algebras of it. After that we introduce a hyper BCK‐algebra on the set of all equivalence classes of an equivalence relation on the states of a deterministic finite...


Quasi-Deterministic Partially Observable Markov Decision Processes

We study a subclass of POMDPs, called quasi-deterministic POMDPs (QDET-POMDPs), characterized by deterministic actions and stochastic observations. While this framework does not model the same general problems as POMDPs, it still captures a number of interesting and challenging problems and, in some cases, has interesting properties. By studying the observability available in this subclass, w...


Journal:
  • CoRR

Volume: abs/1801.03737

Publication date: 2018